Remove IterDataPipe from Inference pipeline #96
Conversation
Important: Review skipped. Auto reviews are disabled on base/target branches other than the default branch. Please check the settings in the CodeRabbit UI or the configuration file. You can disable this status message via the corresponding setting.

Walkthrough: The pull request introduces significant changes across multiple files, primarily focusing on updating the data provider.

Changes
Codecov Report

Attention: Patch coverage is …

Additional details and impacted files:

```
@@           Coverage Diff            @@
##             main      #96      +/- ##
========================================
+ Coverage   96.64%   97.44%   +0.79%
========================================
  Files          23       37      +14
  Lines        1818     3559    +1741
========================================
+ Hits         1757     3468    +1711
- Misses         61       91      +30
```

☔ View full report in Codecov by Sentry.
You're still using `LabelsReaderDP` in a bunch of places in the test, but I think we want to test the new `LabelsReader`, no?
2d2c737 to 1e83d76 — Compare
cf83eee to 0da9ad2 — Compare
Caution: Inline review comments failed to post.
Actionable comments posted: 27
🧹 Outside diff range and nitpick comments (26)
tests/inference/test_bottomup.py (3)
`3-7`: **Consider removing unused imports**

The imports for `Path` and `shutil` appear to be unused in the visible code. If they are not used elsewhere in this file, consider removing them to keep the imports clean and relevant. If these imports are indeed unnecessary, you can apply the following diff:

```diff
-from pathlib import Path
-import shutil
 import sleap_io as sio
 from sleap_nn.data.providers import process_lf
 from sleap_nn.data.normalization import apply_normalization
```

🧰 Tools
🪛 Ruff
3-3: `pathlib.Path` imported but unused. Remove unused import: `pathlib.Path` (F401)
4-4: `shutil` imported but unused. Remove unused import: `shutil` (F401)
`36-38`: **Approved: Direct data loading and processing**

The new approach of loading labels with `sleap_io` and processing them directly aligns well with the PR objective of removing IterDataPipe. This change simplifies the test setup and makes it more straightforward. For improved clarity, consider adding a brief comment explaining the purpose of the `unsqueeze` operation:

```diff
 labels = sio.load_slp(minimal_instance)
 ex = process_lf(labels[0], 0, 2)
-ex["image"] = apply_normalization(ex["image"]).unsqueeze(dim=0)
+# Add batch dimension to match model input requirements
+ex["image"] = apply_normalization(ex["image"]).unsqueeze(dim=0)
```
Line range hint `67-102`: **Approved: Comprehensive inference testing**

The updated test performs inference directly on the processed example and includes comprehensive assertions to verify the output structure and dimensions. This aligns well with the PR objective and ensures the inference model is working as expected. To improve readability, consider grouping related assertions and adding comments to explain their purpose:

```diff
 output = inference_layer(ex)[0]
-assert "confmaps" not in output.keys()
-assert output["pred_instance_peaks"].is_nested
-assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
-assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
+# Verify that confmaps are not returned when not requested
+assert "confmaps" not in output.keys()
+
+# Check structure and dimensions of predicted peaks
+assert output["pred_instance_peaks"].is_nested
+assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
+assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)

 # ... (code for second inference) ...

 output = inference_layer(ex)[0]
-assert tuple(output["confmaps"].shape) == (1, 2, 192, 192)
-assert tuple(output["part_affinity_fields"].shape) == (1, 96, 96, 2)
-assert output["pred_instance_peaks"].is_nested
-assert output["peaks"][0].shape[-1] == 2
-assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
-assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
+# Verify dimensions of returned confmaps and part affinity fields
+assert tuple(output["confmaps"].shape) == (1, 2, 192, 192)
+assert tuple(output["part_affinity_fields"].shape) == (1, 96, 96, 2)
+
+# Check structure and dimensions of predicted peaks
+assert output["pred_instance_peaks"].is_nested
+assert output["peaks"][0].shape[-1] == 2
+assert tuple(output["pred_instance_peaks"][0].shape)[1:] == (2, 2)
+assert tuple(output["pred_peak_values"][0].shape)[1:] == (2,)
```

tests/data/test_resizing.py (2)
`19-19`: **Improve variable naming for better readability.**

While the change to use `LabelsReaderDP` is correct, the variable name `l` is ambiguous and could be improved for better code readability. Consider renaming the variable to something more descriptive, like `labels_reader`:

```diff
-    l = LabelsReaderDP.from_filename(minimal_instance)
-    pipe = SizeMatcher(l, provider=l)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
+    pipe = SizeMatcher(labels_reader, provider=labels_reader)
```

🧰 Tools
🪛 Ruff
19-19: Ambiguous variable name: `l` (E741)
`52-52`: **Consistently improve variable naming throughout the file.**

The ambiguous variable name `l` is used again. For consistency and improved readability, consider updating this and all similar occurrences in the file. Apply this change here and in other similar instances:

```diff
-    l = LabelsReaderDP.from_filename(minimal_instance)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
```

🧰 Tools
🪛 Ruff
52-52: Ambiguous variable name: `l` (E741)
tests/data/test_confmaps.py (1)

`110-110`: **LGTM: Consistent update to data pipeline initialization.**

The change from `LabelsReader` to `LabelsReaderDP` is consistent with the previous modifications and aligns with the PR objectives. The static analysis tool flagged an unused import of `numpy`. Consider removing this import if it's not used elsewhere in the file:

```diff
-import numpy as np
```
tests/data/test_edge_maps.py (2)

Line range hint `193-199`: **LGTM! Consider adding tests for the thread-based functionality.**

The change from `LabelsReader` to `LabelsReaderDP` is consistent with the import statement modification and aligns with the PR objective of removing IterDataPipe. While the current test covers the basic functionality, consider adding specific tests for the thread-based approach mentioned in the PR objectives. This would ensure that the new implementation behaves correctly in a multi-threaded environment.

Example test cases could include:

- Concurrent access to LabelsReaderDP from multiple threads
- Performance comparison between the old and new implementations
- Thread safety of the LabelsReaderDP methods

Would you like assistance in drafting these additional test cases?
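As a starting point, a thread-safety smoke test for a producer-style reader could follow the pattern below. This is a hedged sketch: `DummyReader` is a stand-in with an illustrative `frame_buffer`/`run` interface, not the actual sleap_nn `LabelsReader` API; a real test would construct the reader via its own factory method.

```python
# Hypothetical sketch of a thread-safety smoke test for a reader that
# fills a queue from a background thread. DummyReader stands in for the
# real thread-based reader class.
import queue
import threading


class DummyReader:
    """Stand-in reader: a producer thread pushes samples into a queue."""

    def __init__(self, n_frames: int):
        self.frame_buffer = queue.Queue()
        self.n_frames = n_frames

    def run(self):
        # Producer: enqueue one sample dict per frame, then a sentinel.
        for i in range(self.n_frames):
            self.frame_buffer.put({"frame_idx": i})
        self.frame_buffer.put(None)  # sentinel marking end of stream


def test_reader_thread_safety():
    reader = DummyReader(n_frames=10)
    producer = threading.Thread(target=reader.run)
    producer.start()

    # Consumer: drain the queue until the sentinel arrives.
    seen = []
    while True:
        item = reader.frame_buffer.get()
        if item is None:
            break
        seen.append(item["frame_idx"])
    producer.join()

    # All frames arrive exactly once and in order.
    assert seen == list(range(10))


test_reader_thread_safety()
```

The same skeleton extends to multiple consumer threads by draining the queue from several workers and asserting that the union of their results covers every frame index exactly once.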
Line range hint `1-199`: **Summary: Changes align with PR objectives, but the thread-based approach is not visible.**

The modifications in this file successfully replace `LabelsReader` with `LabelsReaderDP`, which aligns with the PR objective of removing IterDataPipe. However, the thread-based approach mentioned in the PR objectives is not visible in these changes. Consider the following suggestions for further development:

- Implement and test the thread-based approach mentioned in the PR objectives.
- Update the docstring of the `test_part_affinity_fields_generator` function to reflect the use of `LabelsReaderDP`.
- If `LabelsReaderDP` has any new features or behaviors different from `LabelsReader`, consider adding new test cases to cover these aspects.

These additions would ensure that the new implementation is thoroughly tested and documented, improving the overall quality and maintainability of the codebase.
sleap_nn/data/pipelines.py (2)

`255-255`: **LGTM with a minor suggestion**

The changes look good and align with the PR objective of removing the IterDataPipe dependency. The update to `LabelsReaderDP` in the docstring and the addition of `centroids_confidence_maps` to the `keep_keys` list are appropriate. Consider adding a brief comment explaining the purpose of the `centroids_confidence_maps` key, especially if it's a new addition to the pipeline output. This would improve code readability and maintainability.

Also applies to: 279-279
`356-356`: **LGTM with a suggestion for consistency**

The changes are appropriate and align with the PR objective. The update to `LabelsReaderDP` in the docstring and the addition of `part_affinity_fields` to the `keep_keys` list are correct. For consistency with the `CentroidConfmapsPipeline` class, consider adding a brief comment explaining the purpose of the `part_affinity_fields` key in the `keep_keys` list. This would improve code readability and maintain a consistent documentation style across pipeline classes.

Also applies to: 366-366
tests/inference/test_predictors.py (3)

`87-93`: **Approve the introduction of the `max_instances` variable**

The introduction of the `max_instances` variable and its usage in the function call improves code readability and maintainability. This change allows for easier modification of the maximum-instances value in the future. Consider moving the `max_instances` variable to the top of the function, or even making it a constant at the module level if it's used in multiple test functions. This would further improve the code's organization and make it easier to adjust this value across multiple tests if needed.
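The module-level variant could look like the following sketch. Everything here is illustrative: `run_inference` is a stand-in for the real predictor call, not sleap_nn API.

```python
# Hypothetical sketch: a module-level constant shared by multiple tests.
MAX_INSTANCES = 6


def run_inference(max_instances):
    """Stand-in predictor returning per-frame instance lists, capped at max_instances."""
    detected = [5, 2, 7]  # pretend raw detection counts per frame
    return [list(range(min(n, max_instances))) for n in detected]


def test_max_instances_cap():
    frames = run_inference(max_instances=MAX_INSTANCES)
    # Every frame respects the shared cap.
    assert all(len(instances) <= MAX_INSTANCES for instances in frames)


test_max_instances_cap()
```

With the constant at module level, adjusting the cap for all tests becomes a one-line change.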
`97-99`: **Approve the updated assertion using `max_instances`**

The assertion has been correctly updated to use the `max_instances` variable, which aligns with the earlier changes. The use of `<=` in the assertion allows for flexibility in the number of instances, up to the maximum. To improve clarity, consider adding a comment explaining why the number of centroids might be less than or equal to `max_instances`. This would help future readers understand the expected behavior more easily.
Line range hint `1-450`: **Summary of changes and recommendations**

The changes in this file primarily involve adjustments to test parameters and expected outcomes for various predictor classes. Here are the key points that need attention:

- The lowering of `peak_threshold` values in multiple tests may impact the number of detected instances. Verify that these changes align with the expected behavior of the predictors.
- The significant increase in the expected number of labels (from 100 to 1100) in the `test_single_instance_predictor` function needs clarification, especially considering the addition of the `videoreader_end_idx` parameter limiting frames to 100.
- The introduction of the `max_instances` variable improves code readability, but consider moving it to a more prominent location for better visibility.
- Some assertions have been updated to reflect new output structures or use new variables. Ensure these changes accurately represent the expected behavior of the predictors.

Overall, while the changes seem to be part of updating the tests to match new predictor behaviors, it's crucial to verify that all modifications align with the intended functionality and performance of the predictors. Consider adding comments to explain significant changes in expected outcomes or test parameters to improve code maintainability and clarity for future readers.
tests/training/test_model_trainer.py (2)

`73-73`: **Document the new parameter and verify the test setup**

A new parameter, `minimal_instance_bottomup_ckpt: str`, has been added to the `test_trainer` function. This parameter is likely used for weight-loading verification. Please consider the following improvements:

- Add a docstring to explain the purpose and expected format of the `minimal_instance_bottomup_ckpt` parameter.
- Ensure that the test setup (e.g., in `conftest.py` or similar) provides this checkpoint file correctly.

Example docstring:

```python
def test_trainer(config, tmp_path: str, minimal_instance_bottomup_ckpt: str):
    """Test the ModelTrainer class.

    Args:
        config: The configuration object for the test.
        tmp_path (str): Temporary path for saving test outputs.
        minimal_instance_bottomup_ckpt (str): Path to a minimal-instance
            bottom-up model checkpoint for weight verification.
    """
    # ... (rest of the function)
```
`280-297`: **Approve weight-loading verification, with a suggestion**

The addition of weight-loading verification is a good practice to ensure model consistency between saved checkpoints and loaded models. Consider increasing the precision of the weight comparison:

```diff
-    assert np.all(np.abs(first_layer_ckpt - model_ckpt) < 1e-3)
+    assert np.allclose(first_layer_ckpt, model_ckpt, atol=1e-5, rtol=1e-5)
```

This change uses `np.allclose()` with stricter tolerance levels, which can catch smaller discrepancies in weight loading.

tests/inference/test_topdown.py (3)
`85-85`: **Simplify the dictionary key check**

Instead of using `key not in dict.keys()`, use `key not in dict` for a more idiomatic and efficient check. Apply this diff to simplify the key check:

```diff
-assert "instance_image" not in out.keys()
+assert "instance_image" not in out
```

🧰 Tools
🪛 Ruff
85-85: Use `key not in dict` instead of `key not in dict.keys()`. Remove `.keys()` (SIM118)
`113-120`: **Review commented-out code in data processing**

The line `# ex["centroids"] = generate_centroids(ex["instances"], 0)` is commented out. If this code is no longer necessary, consider removing it. If it is needed, uncomment it and ensure it functions correctly.
`123-126`: **Improve parameter alignment in `CentroidCrop`**

For better readability, align the parameters in the `CentroidCrop` instantiation vertically. Apply this diff to adjust the alignment:

```diff
 topdown_inf_layer = TopDownInferenceModel(
-    centroid_crop=CentroidCrop(
-        use_gt_centroids=True, anchor_ind=0, crop_hw=(160, 160)
-    ),
+    centroid_crop=CentroidCrop(
+        use_gt_centroids=True,
+        anchor_ind=0,
+        crop_hw=(160, 160),
+    ),
     instance_peaks=FindInstancePeaksGroundTruth(),
 )
```

sleap_nn/data/providers.py (1)
Line range hint `155-186`: **Refactor to eliminate code duplication in data processing.**

The logic for processing `LabeledFrame` objects into sample dictionaries is duplicated in both `LabelsReaderDP.__iter__` and `LabelsReader.run`. Consider refactoring by utilizing the existing `process_lf` function to reduce code redundancy and improve maintainability.

Also applies to: 339-380
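The shape of that refactor can be sketched as follows. This is a simplified, hypothetical version: the class and function names mirror the ones discussed above, but the bodies (dict-based frames, a plain list instead of the real queue) are stand-ins, not the sleap_nn implementation.

```python
# Hypothetical sketch: both readers delegate per-frame processing to one
# shared helper, so the sample-building logic lives in a single place.
def process_lf(lf, video_idx, max_instances):
    """Convert one labeled frame (simplified as a dict) into a sample dict."""
    return {
        "video_idx": video_idx,
        "frame_idx": lf["frame_idx"],
        "instances": lf["instances"][:max_instances],
    }


class LabelsReaderDP:
    """IterDataPipe-style reader: yields samples lazily."""

    def __init__(self, frames, max_instances=2):
        self.frames = frames
        self.max_instances = max_instances

    def __iter__(self):
        for lf in self.frames:
            yield process_lf(lf, 0, self.max_instances)


class LabelsReader:
    """Thread-style reader: pushes samples into an output buffer."""

    def __init__(self, frames, max_instances=2):
        self.frames = frames
        self.max_instances = max_instances

    def run(self, out):
        # In the real class this would put() into a queue from a thread.
        for lf in self.frames:
            out.append(process_lf(lf, 0, self.max_instances))
```

Because both paths call `process_lf`, any change to sample construction automatically applies to both readers, and a single unit test of `process_lf` covers the shared logic.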
sleap_nn/training/model_trainer.py (3)

`584-584`: **Simplify dictionary iteration by removing `.keys()`**

When iterating over dictionary keys, it's more efficient to use `for k in dict` instead of `for k in dict.keys()`. Apply this diff to simplify:

```diff
 ckpt["state_dict"] = {
     k: ckpt["state_dict"][k]
-    for k in ckpt["state_dict"].keys()
+    for k in ckpt["state_dict"]
     if ".head" not in k
 }
```

🧰 Tools
🪛 Ruff
584-584: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
`578-578`: **Address the TODO: handle different input channels**

There's a TODO comment indicating the need to handle different input channels. This is important for ensuring compatibility when loading pre-trained weights with models expecting different input dimensions. Would you like assistance in implementing support for different input channels? I can help generate a solution or open a GitHub issue to track this task.
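One common way to resolve this kind of TODO is to remap only the first conv layer's weights when channel counts differ: sum the channel axis when going from many channels to one (e.g. RGB to grayscale), or tile and rescale when going the other way so activation magnitudes stay roughly comparable. The sketch below uses NumPy and an illustrative function name; it is a hedged example of the technique, not the sleap_nn implementation.

```python
# Hypothetical sketch: adapt a conv weight of shape (out, in, kh, kw)
# to a different number of input channels.
import numpy as np


def adapt_input_conv(weight: np.ndarray, in_channels: int) -> np.ndarray:
    """Return a copy of `weight` remapped to `in_channels` input channels."""
    old_in = weight.shape[1]
    if old_in == in_channels:
        return weight
    if in_channels == 1:
        # Collapse e.g. RGB filters into one grayscale filter.
        return weight.sum(axis=1, keepdims=True)
    # Tile existing channels, truncate, and rescale so the summed
    # per-position response is approximately preserved.
    repeats = -(-in_channels // old_in)  # ceil division
    tiled = np.tile(weight, (1, repeats, 1, 1))[:, :in_channels]
    return tiled * (old_in / in_channels)
```

In a checkpoint-loading path, this would be applied to just the first-layer entry of `state_dict` before `load_state_dict`, leaving all other weights untouched.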
`510-510`: **Use `Optional[str]` for the `trained_ckpts_path` type annotation**

The `trained_ckpts_path` parameter is currently annotated as `str = None`, which may cause confusion since `None` is not an instance of `str`. To improve type correctness, consider using `Optional[str] = None` for parameters that can be `None`. Apply this diff to update the type annotations:

```diff
 def __init__(
     self,
     config: OmegaConf,
     skeletons: Optional[List[sio.Skeleton]],
     model_type: str,
-    trained_ckpts_path: str = None,
+    trained_ckpts_path: Optional[str] = None,
 ):
```

Apply similar changes in all occurrences where `trained_ckpts_path` is defined.

Also applies to: 692-692, 765-765, 838-838, 911-911
sleap_nn/inference/predictors.py (3)

`13-13`: **Remove the unused import `litdata`**

The import `litdata` is not used in the code and can be safely removed to clean up unnecessary dependencies.

🧰 Tools
🪛 Ruff
13-13: `litdata` imported but unused. Remove unused import: `litdata` (F401)

`19-19`: **Remove the unused import `apply_sizematcher`**

The function `apply_sizematcher` is imported but not used. Consider removing it to maintain code cleanliness.

🧰 Tools
🪛 Ruff
19-19: `sleap_nn.data.resizing.apply_sizematcher` imported but unused. Remove unused import: `sleap_nn.data.resizing.apply_sizematcher` (F401)

`27-28`: **Remove the unused imports `generate_centroids` and `generate_crops`**

The functions `generate_centroids` and `generate_crops` are imported but not utilized in the code. Removing unused imports helps reduce clutter.

🧰 Tools
🪛 Ruff
27-27: `sleap_nn.data.instance_centroids.generate_centroids` imported but unused. Remove unused import: `sleap_nn.data.instance_centroids.generate_centroids` (F401)
28-28: `sleap_nn.data.instance_cropping.generate_crops` imported but unused. Remove unused import: `sleap_nn.data.instance_cropping.generate_crops` (F401)
sleap_nn/inference/topdown.py (1)

`531-533`: **Move the `eval()` call outside the loop**

You're calling `self.instance_peaks.eval()` inside the loop. To avoid redundant calls, move it outside the loop. Apply this diff to refactor the code:

```diff
+        self.instance_peaks.eval()
         for i in batch:
-            self.instance_peaks.eval()
             peaks_output.append(self.instance_peaks(i))
```
🛑 Comments failed to post (27)
tests/data/test_normalization.py (1)

`9-9`: 💡 **Codebase verification: incomplete removal of `LabelsReader` imports and usages.**

Several files still import and use `LabelsReader`, which prevents the complete removal of the `IterDataPipe` dependency as intended by the PR. Please update the following files to replace `LabelsReader` with `LabelsReaderDP`:

- sleap_nn/inference/predictors.py
- sleap_nn/data/resizing.py
- tests/data/test_providers.py
- tests/data/test_normalization.py
- tests/data/test_instance_cropping.py
- tests/data/test_instance_centroids.py
- tests/data/test_edge_maps.py
- tests/data/test_confmaps.py
- tests/data/test_augmentation.py
- tests/data/test_pipelines.py

🔗 Analysis chain

**Approve the import change and verify consistency.**

The change from `LabelsReader` to `LabelsReaderDP` aligns with the PR objective to remove the IterDataPipe dependency. This likely implements the new thread-based approach. Let's verify whether this change is consistent across the codebase.

🏁 Scripts executed

The first attempt used `--type python`, which ripgrep does not recognize (length of output: 470); the corrected script uses `-g "*.py"` instead:

```shell
#!/bin/bash
# Description: Check for any remaining imports of LabelsReader and usages of LabelsReaderDP

# Search for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg -g "*.py" "from sleap_nn\.data\.providers import LabelsReader"

# Search for new imports of LabelsReaderDP
echo "Checking for new LabelsReaderDP imports:"
rg -g "*.py" "from sleap_nn\.data\.providers import LabelsReaderDP"

# Search for any remaining usage of LabelsReader
echo "Checking for remaining LabelsReader usage:"
rg -g "*.py" "LabelsReader\."

# Search for new usage of LabelsReaderDP
echo "Checking for new LabelsReaderDP usage:"
rg -g "*.py" "LabelsReaderDP\."
```

Length of output: 4193
tests/data/test_instance_cropping.py (1)

`13-13`: 💡 **Codebase verification: remove remaining imports of `LabelsReader`.**

The shell scripts indicate that `LabelsReader` is still imported in the following file:

- `sleap_nn/inference/predictors.py`: `from sleap_nn.data.providers import LabelsReader, VideoReader`

Please update this import to `LabelsReaderDP` to align with the PR objectives.

🔗 Analysis chain

**Approve the import change and verify its impact.**

The import change from `LabelsReader` to `LabelsReaderDP` aligns with the PR objective of removing IterDataPipe. This change appears to be part of the transition to the new system. To ensure this change is consistent across the codebase, please run the following script.

🏁 Scripts executed

The first attempt used `--type python`, which ripgrep does not support (length of output: 461); the corrected script uses `--type py`:

```shell
#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and absence of LabelsReader

# Test 1: Check for any remaining imports of LabelsReader
echo "Checking for remaining LabelsReader imports:"
rg --type py 'from sleap_nn\.data\.providers import LabelsReader'

# Test 2: Verify the correct import and usage of LabelsReaderDP
echo "Verifying LabelsReaderDP usage:"
rg --type py 'from sleap_nn\.data\.providers import LabelsReaderDP'
rg --type py 'LabelsReaderDP\.from_filename'
```

Length of output: 3724
tests/data/test_augmentation.py (1)

`13-13`: 💡 **Codebase verification: remaining `LabelsReader` imports found.**

The following files still import `LabelsReader` and should be updated to use `LabelsReaderDP`:

- `sleap_nn/inference/predictors.py`

Please update these imports to ensure consistency and complete the removal of the IterDataPipe dependency.

🔗 Analysis chain

**LGTM! Verify other imports across the codebase.**

The import change from `LabelsReader` to `LabelsReaderDP` aligns with the PR objective of removing the IterDataPipe dependency. This change is correct and follows Python conventions. To ensure consistency across the codebase, please run the following script to check for any remaining `LabelsReader` imports. If the script returns any results, those files may need to be updated to use `LabelsReaderDP` instead.

🏁 Scripts executed

The first two attempts used `--type python`, which ripgrep does not recognize, and encountered errors (lengths of output: 321 and 191); the corrected script uses `--type py`:

```shell
#!/bin/bash
# Description: Check for any remaining LabelsReader imports using the correct file type
rg --type py 'from sleap_nn.data.providers import LabelsReader' || echo "No remaining LabelsReader imports found."
```

Length of output: 972
tests/data/test_resizing.py (1)

`68-68`: 🛠️ **Refactor suggestion: refactor ambiguous variable names file-wide.**

The ambiguous variable name `l` is used consistently throughout the file. To improve code readability and maintainability, it's recommended to refactor all instances of this variable name. Would you like me to create a script to automate this change across the file? The script would replace all instances of `l = LabelsReaderDP.from_filename(...)` with `labels_reader = LabelsReaderDP.from_filename(...)`.

🧰 Tools
🪛 Ruff
68-68: Ambiguous variable name: `l` (E741)
tests/data/test_confmaps.py (1)

`22-22`: 💡 **Codebase verification: remaining instances of `LabelsReader` detected.**

Several instances of `LabelsReader` remain in the codebase. Please update them to `LabelsReaderDP` to ensure consistency and maintain compatibility:

- tests/inference/test_predictors.py
- tests/tracking/test_tracker.py
- tests/tracking/candidates/test_local_queues.py
- tests/tracking/candidates/test_fixed_window.py
- tests/data/test_providers.py
- sleap_nn/data/providers.py
- sleap_nn/inference/predictors.py
- sleap_nn/data/pipelines.py

🔗 Analysis chain

**LGTM: data pipeline initialization updated.**

The change from `LabelsReader` to `LabelsReaderDP` is consistent with the PR objectives. The function signature remains the same, which is good for maintaining compatibility. To ensure the new implementation is fully compatible, please run the following verification script.

🏁 Scripts executed

The first three attempts failed (lengths of output: 207, 318, 192) because the look-around pattern `(?!DP)` requires ripgrep's `--pcre2` flag and the file type must be `py`, not `python`; the corrected script:

```shell
#!/bin/bash
# Description: Verify that all occurrences of LabelsReader have been replaced with LabelsReaderDP
# The search excludes the current file to avoid false positives from the changes under review.
# If this search returns results, it may indicate places where LabelsReader still needs to be updated.
rg --pcre2 'LabelsReader(?!DP)' --type py -g '!tests/data/test_confmaps.py'
```

Length of output: 4395
sleap_nn/data/resizing.py (1)

`7-7`: 💡 **Codebase verification: `IterDataPipe` is still in use and should not be removed.**

`IterDataPipe` is still imported and used by the following classes within `sleap_nn/data/resizing.py`:

- `Resizer(IterDataPipe)`
- `PadToStride(IterDataPipe)`
- `SizeMatcher(IterDataPipe)`

If the intention is to remove `IterDataPipe`, please refactor these classes accordingly. Otherwise, consider retaining the import to maintain existing functionality.

🔗 Analysis chain

**Approve the change, but question the `IterDataPipe` usage.**

The change from `LabelsReader` to `LabelsReaderDP` aligns with the PR objective of transitioning away from IterDataPipe. However, `IterDataPipe` is still imported on line 9. Could you clarify whether `IterDataPipe` is still needed in this file? If not, consider removing the import:

```diff
-from torch.utils.data.datapipes.datapipe import IterDataPipe
```

🏁 Scripts executed

Earlier attempts using `rg --type python` and an `ast-grep` pattern errored (lengths of output: 187 and 546); the working check uses plain `grep` to find classes still inheriting from `IterDataPipe`:

```shell
#!/bin/bash
# Description: Check for IterDataPipe usage in sleap_nn/data/resizing.py
grep -E 'IterDataPipe' sleap_nn/data/resizing.py
```

Length of output: 997
sleap_nn/data/pipelines.py (1)
255-255: 💡 Codebase verification
Incomplete Removal of
LabelsReader
: Multiple Instances Still PresentThe verification revealed that
LabelsReader
is still being used in several parts of the codebase:
Test Files:
tests/inference/test_predictors.py
tests/tracking/candidates/test_fixed_window.py
tests/tracking/test_tracker.py
tests/tracking/candidates/test_local_queues.py
tests/data/test_providers.py
Implementation Files:
sleap_nn/data/providers.py
sleap_nn/inference/predictors.py
Documentation:
sleap_nn/data/pipelines.py
(in docstrings)To fully align with the PR objective of removing the
IterDataPipe
dependency, please update all instances ofLabelsReader
toLabelsReaderDP
.🔗 Analysis chain
Summary: Changes align with PR objective and maintain consistency
The modifications to
CentroidConfmapsPipeline
andBottomUpPipeline
classes are consistent and appropriate:
- Both classes now use
LabelsReaderDP
instead ofLabelsReader
in their docstrings, aligning with the PR objective of removing IterDataPipe dependency.- New keys (
centroids_confidence_maps
andpart_affinity_fields
) are added to the respectivekeep_keys
lists, ensuring that the pipeline outputs include the necessary data.These changes appear to successfully update the data pipeline to work with the new data provider while maintaining the required functionality.
To ensure that these changes are consistent throughout the codebase, please run the following script:
This script will help identify any inconsistencies in the usage of
LabelsReaderDP
and the new keys across the codebase.Also applies to: 279-279, 356-356, 366-366
To address the issues found during script execution, please run the updated verification script below:
This updated script includes the
--pcre2
flag to support look-around in the firstrg
command and corrects the file type topy
in both commands. Running this should provide the necessary information to verify the changes effectively.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify the usage of LabelsReaderDP and the presence of new keys in other pipeline classes. # Check for any remaining instances of LabelsReader echo "Checking for remaining instances of LabelsReader:" rg "LabelsReader(?!DP)" --type python # Check for consistent usage of new keys in keep_keys lists echo "Checking for consistent usage of new keys in keep_keys lists:" rg "keep_keys\s*=\s*\[" --type python -A 10Length of output: 592
Script:
```shell
#!/bin/bash
# Description: Verify the usage of LabelsReaderDP and the presence of new keys in other pipeline classes.

# Check for any remaining instances of LabelsReader using PCRE2 for look-around
echo "Checking for remaining instances of LabelsReader:"
rg --pcre2 "LabelsReader(?!DP)" --type py

# Check for consistent usage of new keys in keep_keys lists
echo "Checking for consistent usage of new keys in keep_keys lists:"
rg "keep_keys\s*=\s*\[" --type py -A 10
```

Length of output: 7075
tests/data/test_pipelines.py (1)
566-566: 💡 Codebase verification
Issues Found: Incomplete replacement of `LabelsReader` with `LabelsReaderDP`.

The following references to `LabelsReader` were found and need to be updated:

- `tests/tracking/test_tracker.py`: `provider="LabelsReader"`
- `tests/tracking/candidates/test_fixed_window.py`: `provider="LabelsReader"`
- `tests/tracking/candidates/test_local_queues.py`: `provider="LabelsReader"`
- `tests/inference/test_predictors.py`: multiple instances of `provider="LabelsReader"` and related comments
- `tests/assets/minimal_instance_bottomup/training_config.yaml`: `provider: LabelsReader`
- `tests/assets/minimal_instance_centroid/training_config.yaml`: `provider: LabelsReader`
- `tests/data/test_providers.py`: multiple instances, including class instantiations and docstrings
- `tests/assets/minimal_instance/initial_config.yaml`: `provider: LabelsReader`
- `tests/assets/minimal_instance/training_config.yaml`: `provider: LabelsReader`
- `tests/assets/minimal_instance_centroid/initial_config.yaml`: `provider: LabelsReader`
- `tests/assets/minimal_instance_bottomup/initial_config.yaml`: `provider: LabelsReader`

Please ensure all instances of `LabelsReader` are replaced with `LabelsReaderDP` to maintain consistency and functionality.

🔗 Analysis chain
LGTM: LabelsReader consistently replaced with LabelsReaderDP.
These changes are consistent with the previous modifications to use LabelsReaderDP throughout the file. All constructor calls remain unchanged, indicating straightforward replacements.
To ensure the LabelsReaderDP implementation is complete and consistent, run the following script:
Also applies to: 610-610, 688-688, 768-768
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
```shell
#!/bin/bash
# Description: Verify the LabelsReaderDP implementation

# Check for any remaining references to LabelsReader
rg "LabelsReader(?!DP)" tests/

# Compare the method signatures of LabelsReader and LabelsReaderDP
rg -U --multiline "class LabelsReader.*?:.*?\n.*?def" tests/
rg -U --multiline "class LabelsReaderDP.*?:.*?\n.*?def" tests/
```

Length of output: 489
Script:
```shell
#!/bin/bash
# Description: Verify the LabelsReaderDP implementation with corrected commands

# Check for any remaining references to LabelsReader not followed by DP using PCRE2
rg --pcre2 "LabelsReader(?!DP)" tests/

# Compare the method signatures of LabelsReader and LabelsReaderDP using ast-grep
ast-grep --lang python --pattern $'class LabelsReaderDP: $$$ def $$$' tests/
ast-grep --lang python --pattern $'class LabelsReader: $$$ def $$$' tests/
```

Length of output: 2529
tests/inference/test_single_instance.py (2)
84-84: ⚠️ Potential issue

Simplify membership test in assertion

Instead of checking `key in dict.keys()`, you can directly check `key in dict` for a more concise and Pythonic approach.

Apply this diff to fix the issue:

```diff
-assert "pred_confmaps" in outputs[0][0].keys()
+assert "pred_confmaps" in outputs[0][0]
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
assert "pred_confmaps" in outputs[0][0]
```
🧰 Tools
🪛 Ruff
84-84: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
71-72: ⚠️ Potential issue

Avoid in-place modification of `ex["image"]`

Modifying `ex["image"]` in place may affect other tests or assertions that rely on the original image. Consider creating a copy of the example or the image before resizing to prevent unintended side effects.

Apply this diff to fix the issue:

```diff
-ex["image"] = resize_image(ex["image"], 0.5)
+ex_resized = ex.copy()
+ex_resized["image"] = resize_image(ex_resized["image"], 0.5)
+outputs.append(find_peaks_layer(ex_resized))
```

📝 Committable suggestion

```python
ex_resized = ex.copy()
ex_resized["image"] = resize_image(ex_resized["image"], 0.5)
outputs.append(find_peaks_layer(ex_resized))
```
tests/data/test_providers.py (5)
110-110: ⚠️ Potential issue

Unused loop variable `i` should be replaced with `_`

The loop variable `i` is not used within the loop body. To indicate that it's intentionally unused, consider renaming it to `_`.

Apply this diff:

```diff
  data = []
- for i in range(batch_size):
+ for _ in range(batch_size):
      frame = reader.frame_buffer.get()
```

📝 Committable suggestion

```python
data = []
for _ in range(batch_size):
    frame = reader.frame_buffer.get()
```
🧰 Tools
🪛 Ruff
110-110: Loop control variable `i` not used within loop body. Rename unused `i` to `_i` (B007)
142-142: 🛠️ Refactor suggestion

Avoid using bare `except:` statements

As above, consider specifying the exception type to avoid catching unintended exceptions.

Apply this diff:

```diff
-except:
+except Exception:
     raise
```

Committable suggestion was skipped due to low confidence.
133-133: ⚠️ Potential issue

Unused loop variable `i` should be replaced with `_`

Similarly, in this loop, the variable `i` is not used. Renaming it to `_` improves code readability.

Apply this diff:

```diff
  data = []
- for i in range(batch_size):
+ for _ in range(batch_size):
      frame = reader.frame_buffer.get()
```

📝 Committable suggestion

```python
data = []
for _ in range(batch_size):
    frame = reader.frame_buffer.get()
```
🧰 Tools
🪛 Ruff
133-133: Loop control variable `i` not used within loop body. Rename unused `i` to `_i` (B007)
119-119: 🛠️ Refactor suggestion

Avoid using bare `except:` statements

Using a bare `except:` can catch unexpected exceptions, including system-exiting exceptions. It's better to catch specific exceptions or use `except Exception:` to handle standard errors.

Apply this diff:

```diff
-except:
+except Exception:
     raise
```

Committable suggestion was skipped due to low confidence.
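To see concretely why the bare `except:` is risky, here is a minimal, self-contained sketch (the function names are illustrative, not from the sleap-nn codebase): a bare `except:` silently absorbs `SystemExit`, while `except Exception:` lets it propagate.

```python
# Illustrative only: a bare `except:` also catches BaseException subclasses
# like SystemExit and KeyboardInterrupt, which normally should terminate
# the program rather than be swallowed.

def swallow_all():
    try:
        raise SystemExit(1)
    except:  # noqa: E722 -- bare except, shown only to demonstrate the hazard
        return "swallowed"

def swallow_errors_only():
    try:
        raise SystemExit(1)
    except Exception:  # SystemExit is not an Exception, so it propagates
        return "swallowed"

print(swallow_all())  # prints "swallowed": SystemExit was silently absorbed

try:
    swallow_errors_only()
except SystemExit:
    print("SystemExit propagated as it should")
```

This is why the review recommends `except Exception:` in the reader threads: a stray `Ctrl-C` or interpreter shutdown still stops the worker instead of being caught and re-raised ambiguously.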
18-18: ⚠️ Potential issue

Rename variable `l` to a more descriptive name

The variable `l` in line 18 is a single-character variable name, which can be ambiguous and reduce code readability. It's recommended to use more descriptive variable names for better clarity.

Apply this diff to rename the variable:

```diff
 def test_providers(minimal_instance):
     """Test LabelsReaderDP module."""
-    l = LabelsReaderDP.from_filename(minimal_instance)
+    labels_reader = LabelsReaderDP.from_filename(minimal_instance)
     sample = next(iter(labels_reader))
```

📝 Committable suggestion

```python
labels_reader = LabelsReaderDP.from_filename(minimal_instance)
```
🧰 Tools
🪛 Ruff
18-18: Ambiguous variable name: `l` (E741)
tests/inference/test_topdown.py (4)
11-11: ⚠️ Potential issue

Remove unused imports `Resizer` and `PadToStride`

The imports `Resizer` and `PadToStride` from `sleap_nn.data.resizing` are not used in the code and should be removed to reduce clutter.

Apply this diff to remove the unused imports:

```diff
-from sleap_nn.data.resizing import SizeMatcher, Resizer, PadToStride
+from sleap_nn.data.resizing import SizeMatcher
```

📝 Committable suggestion

```python
from sleap_nn.data.resizing import SizeMatcher
```
🧰 Tools
🪛 Ruff
11-11: `sleap_nn.data.resizing.Resizer` imported but unused. Remove unused import (F401)

11-11: `sleap_nn.data.resizing.PadToStride` imported but unused. Remove unused import (F401)
8-8: ⚠️ Potential issue

Remove unused import `resize_image`

The import `resize_image` from `sleap_nn.data.resizing` is not utilized in the code and should be removed to clean up the imports.

Apply this diff to remove the unused import:

```diff
-from sleap_nn.data.resizing import resize_image
```

📝 Committable suggestion
🧰 Tools
🪛 Ruff
8-8: `sleap_nn.data.resizing.resize_image` imported but unused. Remove unused import: `sleap_nn.data.resizing.resize_image` (F401)
12-12: ⚠️ Potential issue

Remove unused import `InstanceCropper`

The import `InstanceCropper` from `sleap_nn.data.instance_cropping` is not utilized and should be removed to clean up the code.

Apply this diff to remove the unused import:

```diff
-from sleap_nn.data.instance_cropping import InstanceCropper, generate_crops
+from sleap_nn.data.instance_cropping import generate_crops
```

📝 Committable suggestion

```python
from sleap_nn.data.instance_cropping import generate_crops
```
🧰 Tools
🪛 Ruff
12-12: `sleap_nn.data.instance_cropping.InstanceCropper` imported but unused. Remove unused import: `sleap_nn.data.instance_cropping.InstanceCropper` (F401)
9-9: ⚠️ Potential issue

Remove unused import `InstanceCentroidFinder`

The import `InstanceCentroidFinder` from `sleap_nn.data.instance_centroids` is not used and can be removed to streamline the code.

Apply this diff to remove the unused import:

```diff
-from sleap_nn.data.instance_centroids import InstanceCentroidFinder, generate_centroids
+from sleap_nn.data.instance_centroids import generate_centroids
```

📝 Committable suggestion

```python
from sleap_nn.data.instance_centroids import generate_centroids
```
🧰 Tools
🪛 Ruff
9-9: `sleap_nn.data.instance_centroids.InstanceCentroidFinder` imported but unused. Remove unused import: `sleap_nn.data.instance_centroids.InstanceCentroidFinder` (F401)
sleap_nn/data/providers.py (3)
255-257: ⚠️ Potential issue

Correct the docstring to reflect the method functionality.

The `from_filename` method in `VideoReader` is intended to load a video file, but the docstring incorrectly mentions a `.slp` file, which is a labels file. Update the docstring to accurately describe the method's purpose.

Apply this diff:

```diff
 @classmethod
 def from_filename(
     cls,
     filename: str,
     queue_maxsize: int,
     start_idx: Optional[int] = None,
     end_idx: Optional[int] = None,
 ):
-    """Create VideoReader from a .slp filename."""
+    """Create VideoReader from a video filename."""
     video = sio.load_video(filename)
```

📝 Committable suggestion

```python
"""Create VideoReader from a video filename."""
video = sio.load_video(filename)
frame_buffer = Queue(maxsize=queue_maxsize)
```
383-385: 🛠️ Refactor suggestion

Use logging instead of print statements for error reporting.

Similarly, in `LabelsReader.run`, replace the `print` call with `logging.error` for consistent and configurable error logging.

Apply this diff:

```diff
 # At the top of the file, add:
+import logging

 # Inside the exception handling block:
 except Exception as e:
-    print(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
+    logging.error(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
```

📝 Committable suggestion

```python
import logging

except Exception as e:
    logging.error(f"Error when reading labelled frame. Stopping labels reader.\n{e}")
```
280-282: 🛠️ Refactor suggestion

Use logging instead of print statements for error reporting.

In the exception handling of `VideoReader.run`, replace the `print` call with the `logging` module for better logging practices and configurable error handling.

Apply this diff:

```diff
 # At the top of the file, add:
+import logging

 # Inside the exception handling block:
 except Exception as e:
-    print(f"Error when reading video frame. Stopping video reader.\n{e}")
+    logging.error(f"Error when reading video frame. Stopping video reader.\n{e}")
```

Committable suggestion was skipped due to low confidence.
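As a minimal illustration of the suggested migration (the logger name, handler, and format below are assumptions for demonstration, not the project's actual configuration), a module-level logger routes errors through configurable handlers instead of `print`:

```python
import io
import logging

# Sketch of the suggested print() -> logging migration. The logger name and
# format are illustrative; a StringIO handler is used so the example is
# self-contained and its output can be inspected.
logger = logging.getLogger("sleap_nn.data.providers")
logger.setLevel(logging.ERROR)
logger.propagate = False  # keep the demo output confined to our handler

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
logger.addHandler(handler)

try:
    raise RuntimeError("decode failed")
except Exception as e:
    # Instead of: print(f"Error when reading video frame. ...\n{e}")
    logger.error("Error when reading video frame. Stopping video reader. %s", e)

print(stream.getvalue().strip())
```

Unlike `print`, the logger's destination, level, and format can be reconfigured by the application embedding the reader, which is the configurability the review is after.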
sleap_nn/training/model_trainer.py (1)
579-588: ⚠️ Potential issue

Security Risk: Unrestricted unpickling with `torch.load` can lead to code execution

The use of `torch.load(trained_ckpts_path)` without proper safeguards can pose a security risk. Unpickling data from untrusted sources can execute arbitrary code. Ensure that `trained_ckpts_path` comes from a trusted source, and consider using a more secure method for loading model weights.

Additionally, it's important to add error handling to manage cases where the checkpoint file might be missing or corrupted.

Apply the following changes to enhance security and error handling:

```diff
+import pickle
+import errno
 ...
 if trained_ckpts_path is not None:
     print(f"Loading weights from `{trained_ckpts_path}` ...")
+    if not os.path.isfile(trained_ckpts_path):
+        raise FileNotFoundError(errno.ENOENT, os.strerror(errno.ENOENT), trained_ckpts_path)
     try:
-        ckpt = torch.load(trained_ckpts_path)
+        with open(trained_ckpts_path, 'rb') as f:
+            ckpt = torch.load(f, map_location=self.device, pickle_module=pickle)
     except Exception as e:
         raise RuntimeError(f"Error loading checkpoint: {e}")
     ckpt["state_dict"] = {
         k: ckpt["state_dict"][k] for k in ckpt["state_dict"] if ".head" not in k
     }
     self.load_state_dict(ckpt["state_dict"], strict=False)
```

Committable suggestion was skipped due to low confidence.
🧰 Tools
🪛 Ruff
584-584: Use `key in dict` instead of `key in dict.keys()`. Remove `.keys()` (SIM118)
sleap_nn/inference/predictors.py (3)
634-634: ⚠️ Potential issue

Use `ValueError` instead of `Exception` for invalid provider

Raising a `ValueError` is more appropriate when indicating an invalid argument value. It provides clearer information about the nature of the error.

Apply this diff to improve error handling:

```diff
-raise Exception(
+raise ValueError(
     "Provider not recognised. Please use either `LabelsReader` or `VideoReader` as provider"
 )
```

Committable suggestion was skipped due to low confidence.
1244-1244: ⚠️ Potential issue

Use `ValueError` instead of `Exception` for invalid provider

Switching to `ValueError` for invalid argument values aligns with Python's standard exception handling practices.

Apply this diff to improve error handling:

```diff
-raise Exception(
+raise ValueError(
     "Provider not recognised. Please use either `LabelsReader` or `VideoReader` as provider"
 )
```

Committable suggestion was skipped due to low confidence.
305-305: ⚠️ Potential issue

Call `self.pipeline.stop()` before `self.pipeline.join()`

To ensure proper termination of the pipeline threads, call `self.pipeline.stop()` before joining. This allows the pipeline to cease operations gracefully before the main thread waits for it to finish.

Apply this diff to address the issue:

```diff
 # At the end of _predict_generator method
+self.pipeline.stop()
 self.pipeline.join()
```

📝 Committable suggestion

```python
self.pipeline.stop()
self.pipeline.join()
```
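A small self-contained sketch of the stop-before-join pattern recommended here. The real pipeline's API may differ; `stop()` is assumed to simply set an event that the worker thread polls each iteration:

```python
import queue
import threading

# Hypothetical pipeline thread illustrating why stop() must precede join():
# the worker loops until an event is set, so join() alone could block forever
# if the worker is still producing frames.

class Pipeline(threading.Thread):
    def __init__(self):
        super().__init__(daemon=True)
        self._stop_event = threading.Event()
        self.frame_buffer = queue.Queue(maxsize=4)

    def run(self):
        i = 0
        while not self._stop_event.is_set():
            try:
                # Bounded put with a timeout so the stop flag is re-checked
                # even when the consumer is slow and the buffer is full.
                self.frame_buffer.put(i, timeout=0.1)
                i += 1
            except queue.Full:
                continue

    def stop(self):
        self._stop_event.set()

pipeline = Pipeline()
pipeline.start()
first = pipeline.frame_buffer.get()
pipeline.stop()  # signal the worker to exit its loop...
pipeline.join()  # ...then wait for it to finish
print(first)  # prints 0
```

Without the `stop()` call, `join()` would wait on a worker that never exits its loop, which is exactly the hang the comment warns about.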
sleap_nn/inference/topdown.py (1)
518-523: ⚠️ Potential issue

Simplify nested `if` statements

You can simplify the nested `if` statements into a single condition.

Apply this diff to streamline the code:

```diff
-if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth):
-    if "instances" not in batch:
-        raise ValueError(
-            "Ground truth data was not detected... "
-            "Please load both models when predicting on non-ground-truth data."
-        )
+if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth) and "instances" not in batch:
+    raise ValueError(
+        "Ground truth data was not detected... "
+        "Please load both models when predicting on non-ground-truth data."
+    )
```

📝 Committable suggestion

```python
if isinstance(self.instance_peaks, FindInstancePeaksGroundTruth) and "instances" not in batch:
    raise ValueError(
        "Ground truth data was not detected... "
        "Please load both models when predicting on non-ground-truth data."
    )
```
🧰 Tools
🪛 Ruff
518-519: Use a single `if` statement instead of nested `if` statements (SIM102)
0da9ad2 to 80023b3 Compare

7a2cce7 to b89af63 Compare
This PR removes the IterDataPipe dependency from the inference pipeline. Currently, we use IterDataPipe modules to create the data pipeline from labels while running inference on `.slp` files. Since we're moving to LitData (related issue: #80), we're removing the IterDataPipe modules and implementing a thread-based approach, similar to our current `VideoReader` (related issue: #26).

Summary by CodeRabbit
Release Notes

New Features

- Introduced `LabelsReaderDP` for enhanced data handling in pipelines.
- Updates to the `ModelTrainer` classes.
- Enhanced `CentroidCrop` class to utilize ground-truth centroids during inference.

Bug Fixes

Documentation

Tests

- Added tests for `LabelsReaderDP` and updated assertions based on new data handling logic.
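The thread-based provider pattern this PR describes can be sketched roughly as follows. This is not the actual sleap-nn implementation: the class name, dict keys, and `None` sentinel convention are illustrative assumptions; the real readers expose a `frame_buffer` queue that the inference loop consumes, similar in spirit to the existing `VideoReader`.

```python
import queue
import threading

# Rough sketch of a thread-based provider replacing IterDataPipe: a reader
# thread pushes per-frame dicts into a bounded Queue, then a None sentinel
# marks the end of the stream so the consumer knows when to stop.

class LabelsReaderSketch(threading.Thread):
    def __init__(self, frames, queue_maxsize=8):
        super().__init__(daemon=True)
        self.frames = frames
        self.frame_buffer = queue.Queue(maxsize=queue_maxsize)

    def run(self):
        try:
            for idx, frame in enumerate(self.frames):
                self.frame_buffer.put({"frame_idx": idx, "image": frame})
        finally:
            self.frame_buffer.put(None)  # sentinel: no more frames

reader = LabelsReaderSketch(frames=["f0", "f1", "f2"])
reader.start()

results = []
while True:
    item = reader.frame_buffer.get()
    if item is None:
        break
    results.append(item["frame_idx"])
reader.join()
print(results)  # prints [0, 1, 2]
```

The bounded queue applies backpressure (a slow consumer blocks the reader rather than letting memory grow), which is the main behavior the IterDataPipe-based pipeline provided and this approach preserves.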